Learning with Confident Examples: Rank Pruning for Robust Classification with Noisy Labels
Authors
Abstract
P̃ Ñ learning is the problem of binary classification when training examples may be mislabeled (flipped) uniformly with noise rate ρ1 for positive examples and ρ0 for negative examples. We propose Rank Pruning (RP) to solve P̃ Ñ learning and the open problem of estimating the noise rates. Unlike prior solutions, RP is efficient and general, requiring O(T) time for any choice of probabilistic classifier with fitting time T. We prove RP achieves consistent noise estimation and the same expected risk as learning with uncorrupted labels in ideal conditions, and derive closed-form solutions when conditions are non-ideal. RP achieves state-of-the-art noise rate estimation and F1, error, and AUC-PR on the MNIST and CIFAR datasets, regardless of noise rates. To highlight, RP with a CNN classifier can predict if an MNIST digit is a one or not with only 0.25% error, and 0.46% error across all digits, even when 50% of positive examples are mislabeled and 50% of observed positive labels are mislabeled negative examples.
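The noise model in the abstract can be illustrated with a small simulation. The sketch below is not Rank Pruning itself; it only generates noisy labels by flipping true positives with probability ρ1 and true negatives with probability ρ0, then measures the flip rates. The function names `flip_labels` and `empirical_noise_rates` are hypothetical, and the measurement uses the true labels, which a real P̃ Ñ learner would not have.

```python
import random
from typing import List, Sequence, Tuple


def flip_labels(labels: Sequence[int], rho1: float, rho0: float,
                seed: int = 0) -> List[int]:
    """Return noisy labels for true binary labels (illustrative helper).

    rho1: probability that a true positive (1) is observed as 0.
    rho0: probability that a true negative (0) is observed as 1.
    """
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        flip_prob = rho1 if y == 1 else rho0
        # Flip the label with the class-conditional noise rate.
        noisy.append(1 - y if rng.random() < flip_prob else y)
    return noisy


def empirical_noise_rates(labels: Sequence[int],
                          noisy: Sequence[int]) -> Tuple[float, float]:
    """Fraction of true positives and of true negatives that were flipped.

    Sanity check only: it needs the true labels, so it is not a
    noise-rate estimator in the sense of the abstract.
    """
    pos = [s for y, s in zip(labels, noisy) if y == 1]
    neg = [s for y, s in zip(labels, noisy) if y == 0]
    rho1_hat = sum(1 for s in pos if s == 0) / len(pos)
    rho0_hat = sum(1 for s in neg if s == 1) / len(neg)
    return rho1_hat, rho0_hat


if __name__ == "__main__":
    # Assumed toy data: 5000 positives and 5000 negatives.
    y_true = [1] * 5000 + [0] * 5000
    y_noisy = flip_labels(y_true, rho1=0.5, rho0=0.1)
    print(empirical_noise_rates(y_true, y_noisy))
```

With enough examples the measured rates should land close to the chosen ρ1 and ρ0, which is the quantity a noise-rate estimator has to recover from the noisy labels alone.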
Similar resources
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have shown strong performance in speech recognition systems for both feature extraction and acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
A Learning to Rank from Noisy Data
Learning to Rank, which learns the ranking function from training data, has become an emerging research area in information retrieval and machine learning. Most existing work on learning to rank assumes that the training data is clean, which is, however, not always true. The ambiguity of query intent, the lack of domain knowledge, and the vague definition of relevance levels, all make it diffic...
A Self-Organizing Neural Network that Learns to Detect and Represent Visual Depth from Occlusion Events
This paper discusses issues in noise tolerant learning from sensory data. A model driven approach to symbolic learning from noisy data is suggested. Introduction: Sensor-driven characteristics of visual objects are rarely noise free and most often quite noisy. "The visual world is noisy. Even well posed visual computations are often numerically unstable, if noise is present in both the scene and ...
Robust Loss Functions under Label Noise for Deep Neural Networks
In many applications of classifier learning, training data suffers from label noise. Deep networks are learned using huge training data where the problem of noisy labels is particularly relevant. The current techniques proposed for learning deep networks under label noise focus on modifying the network architecture and on algorithms for estimating true labels from noisy labels. An alternate app...
Learning Transformations for Clustering and Classification
A low-rank transformation learning framework for subspace clustering and classification is proposed here. Many high-dimensional data, such as face images and motion sequences, approximately lie in a union of low-dimensional subspaces. The corresponding subspace clustering problem has been extensively studied in the literature to partition such high-dimensional data into clusters corresponding to...
Journal title:
- CoRR
Volume abs/1705.01936, Issue
Pages -
Publication date 2017